Instruction buffer system for a digital computer.

An instruction buffer for a digital computer controls the flow of instruction stream to an instruction decoder (32). As each instruction is consumed, a shifter (70) removes the consumed bytes and repositions the remaining bytes into the lowest order positions. The byte positions left empty are filled by instruction stream bytes retrieved from one of a pair of prefetch buffers (64, 66) or from a virtual instruction cache (28). One prefetch buffer (66) is filled from the instruction cache (28) after being emptied, but prior to those particular bytes being requested by the instruction decoder (32). The two-level prefetching allows the relatively slow process of cache access to be performed during noncritical






EP 0 380 854 A2



INSTRUCTION BUFFER SYSTEM FOR A DIGITAL COMPUTER



This invention relates generally to an instruc- 
tion buffer system and to a virtual instruction cache 
(VIC) of a high-speed digital computer. 

In the field of high speed computers, most advanced computers pipeline the entire sequence of instruction activities. A prime example is the VAX 8600 computer manufactured and sold by Digital Equipment Corporation, 111 Powdermill Road, Maynard, MA 01754-1418. The instruction pipeline for the VAX 8600 is described in T. Fossum et al., "An Overview of the VAX 8600 System," Digital Technical Journal, No. 1, August 1985, pp. 8-23. Separate pipeline stages are provided for instruction fetch, instruction decode, operand address generation, operand fetch, instruction execute, and result store.

To make effective use of this pipelining capability, it is desirable to keep each stage of the pipeline occupied, performing its intended function on the next instruction to be executed. In order to do this, the instruction fetch stage must retrieve an instruction and pass it to the next stage between each transition of the system clock. Otherwise, the resulting disruption in the instruction stream causes the pipeline to drain, necessitating a time-consuming restart of the entire pipeline. Of course, the purpose of the pipeline is to increase the overall speed of the computer. Thus, it is highly advantageous to avoid these situations where the pipeline is interrupted.

However, the instruction set employed in some computers is of the variable length type, thereby forcing the instruction buffer to have added complexity. In other words, until the instruction (opcode) is decoded, the instruction buffer does not "know" how many of the subsequent bytes of the instruction stream belong with the current instruction. Therefore, the instruction buffer can only respond by loading a preselected number of bytes of the instruction stream, which may or may not include an entire instruction. The instruction decoder will only consume those bytes associated with the immediate instruction. Thereafter, the instruction buffer must determine how many of the present bytes were used by the decoder, shift the unused bytes into the lowest order locations, and then fill the empty buffer locations with subsequent bytes of the instruction stream.
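The consume, shift, and refill cycle described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the hardware of this invention: the class name `InstructionBuffer`, the buffer size, and the use of a Python iterator as the source of instruction-stream bytes are all hypothetical.

```python
class InstructionBuffer:
    """Toy model of an instruction buffer for a variable-length instruction set.

    Hypothetical assumptions: the buffer holds `size` bytes, position 0 is the
    lowest-order byte presented to the decoder, and `stream` yields the
    subsequent instruction-stream bytes in order.
    """

    def __init__(self, size, stream):
        self.size = size
        self.stream = iter(stream)
        self.bytes = []
        self.fill()

    def fill(self):
        # Fill the empty high-order positions with subsequent stream bytes.
        while len(self.bytes) < self.size:
            try:
                self.bytes.append(next(self.stream))
            except StopIteration:
                break

    def consume(self, n):
        # The decoder reports how many bytes the current instruction used.
        # Those bytes are removed, the unused bytes shift down to position 0,
        # and the vacated positions are refilled.
        if not 0 <= n <= len(self.bytes):
            raise ValueError("cannot consume more bytes than are buffered")
        used, self.bytes = self.bytes[:n], self.bytes[n:]
        self.fill()
        return used


buf = InstructionBuffer(4, range(10))
print(buf.bytes)        # [0, 1, 2, 3]
print(buf.consume(3))   # [0, 1, 2]
print(buf.bytes)        # [3, 4, 5, 6]
```

In hardware the repositioning is done by a shifter in a single step; the list slicing here only mirrors its effect.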

Reference to the main memory to retrieve these subsequent bytes of instruction stream necessarily involves multiple clock cycles. To avoid accessing main memory, many digital computers include a high speed cache between the processing unit and the main memory. Access to this cache takes only a small number of cycles of the processor's clock but often involves translating virtual addresses to physical addresses. To further accelerate the access to the instruction stream, some systems dedicate a cache solely to store the instructions. The access to this "instruction cache" often does not entail translating from virtual to physical addresses, as the instructions are stored under their virtual addresses. This access to the instruction stream in a high speed virtual instruction cache may only involve one cycle of the processor's clock. Because the virtual instruction cache contains only a portion of the main memory, however, each reference to the virtual instruction cache involves comparing the requested address with the stored address to first determine if the desired instruction stream is present, and then retrieving the requested instruction stream. Therefore, owing to the variable length nature of the instruction set, the instruction buffer cannot predict whether a reference to the VIC will be required by the instruction currently being decoded.

To prevent numerous references to the virtual instruction cache, a prefetch buffer is provided to maintain a preselected number of the subsequent bytes of instruction stream which are expected to be used by the instruction decoder. This process forestalls the inevitable reference to the virtual instruction cache.

Since the virtual instruction cache contains only a portion of the instruction stream, refills to the instruction buffer can result in "misses" in the virtual instruction cache, which require fetches from the main memory. These main memory fetches generally require many clock cycles, thereby interrupting the pipeline.

To ensure that the instruction pipeline of a digital computer remains full to provide for fast and efficient execution of the instructions, an instruction buffer includes first and second prefetch buffers for storing a preselected number of subsequent bytes of instruction stream. The first prefetch buffer is independently addressable to retrieve a selected number of sequential bytes contained therein. Means are provided for refilling the decoder with the number of sequential bytes of instruction stream corresponding to the number of bytes currently being decoded. The refill means retrieves the instruction stream bytes from the first prefetch buffer sequentially and sets a "valid bit" corresponding to each byte of instruction stream retrieved. The second prefetch buffer need only contain all valid or all invalid bytes, and therefore only one valid bit need be held for the second prefetch buffer. The first prefetch buffer is refilled with the preselected number of subsequent instruction stream bytes in response to all of the valid bits corresponding to each byte of the instruction stream contained therein being clear. Similarly, the second prefetch buffer is refilled with the preselected number of subsequent instruction stream bytes when it becomes empty.
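The two-buffer arrangement summarized above, with one valid bit per byte in the first prefetch buffer and a single valid bit for the second, can be sketched as follows. This is a hypothetical model, not the claimed circuit: `fetch` stands in for the virtual instruction cache, the class and method names are invented, and we assume (this is our assumption, not stated in the summary) that a byte's valid bit is cleared when the byte is handed to the decoder and that the first buffer is reloaded from the second so that byte order is preserved.

```python
QUADWORD = 8  # bytes held by each prefetch buffer (one quadword)


class TwoLevelPrefetch:
    """Sketch of two quadword prefetch buffers feeding a decoder.

    Assumptions (hypothetical): `fetch()` returns the next 8 ISTREAM bytes;
    a per-byte valid bit is cleared when that byte is taken; the first buffer
    is reloaded from the second when all its valid bits are clear.
    """

    def __init__(self, fetch):
        self.fetch = fetch
        self.first = [0] * QUADWORD
        self.first_valid = [False] * QUADWORD  # one valid bit per byte
        self.second = [0] * QUADWORD
        self.second_valid = False              # single bit: all valid or all invalid
        self._refill()

    def _refill(self):
        # First buffer: refill only when every per-byte valid bit is clear.
        if not any(self.first_valid):
            if not self.second_valid:
                self.second = list(self.fetch())
                self.second_valid = True
            self.first = self.second
            self.first_valid = [True] * QUADWORD
            self.second_valid = False
        # Second buffer: refill as soon as it becomes empty.
        if not self.second_valid:
            self.second = list(self.fetch())
            self.second_valid = True

    def take(self, n):
        """Hand up to n sequential bytes from the first buffer to the decoder."""
        out = []
        for i in range(QUADWORD):
            if len(out) == n:
                break
            if self.first_valid[i]:
                out.append(self.first[i])
                self.first_valid[i] = False  # byte consumed
        self._refill()
        return out


def quadwords():
    base = 0
    while True:
        yield list(range(base, base + QUADWORD))
        base += QUADWORD


source = quadwords()
pf = TwoLevelPrefetch(lambda: next(source))
print(pf.take(5))  # [0, 1, 2, 3, 4]
print(pf.take(5))  # [5, 6, 7]
print(pf.take(2))  # [8, 9]
```

For simplicity `take` draws only from the first buffer in a single call, so a request that spans the buffer boundary returns fewer bytes; that limitation belongs to this sketch.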

Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the drawings, in which:

FIG. 1 is a top level block diagram of a portion of a central processing unit and associated memory;

FIG. 2 is a functional diagram of the pipeline processing of a longword ADD operand;

FIG. 3 is a block diagram of the virtual instruction cache;

FIG. 4 is a general block diagram of the instruction buffer interfaced with the virtual instruction cache;

FIG. 5 is a detailed block diagram of the instruction buffer and the interface to the instruction decoder;

FIG. 6 is a schematic diagram of the shifter of the instruction buffer;

FIG. 7 is a schematic diagram of the rotator of the instruction buffer;

FIG. 8 is a schematic diagram of the merge multiplexer of the instruction buffer; and

FIG. 9 is a block diagram of the two-unit valid block store strams of the virtual instruction cache.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that it is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Turning now to the drawings, FIG. 1 is a top level block diagram of a portion of a pipelined computer system 10. The system 10 includes at least one central processing unit (CPU) 12 having access to main memory 14. It should be understood that additional CPUs could be used in such a system by sharing the main memory 14.

Inside the CPU 12, the execution of an individual instruction is broken down into multiple smaller tasks. These tasks are performed by dedicated, separate, independent functional units that are optimized for that purpose.

Although each instruction ultimately performs a different operation, many of the smaller tasks into which each instruction is broken are common to all instructions. Generally, the following steps are performed during the execution of an instruction: instruction fetch, instruction decode, operand fetch, execution, and result store. Thus, by the use of dedicated hardware stages, the steps can be overlapped, thereby increasing the total instruction throughput.

The data path through the pipeline includes a respective set of registers for transferring the results of each pipeline stage to the next pipeline stage. These transfer registers are clocked in response to a common system clock. For example, during a first clock cycle, the first instruction is fetched by hardware dedicated to instruction fetch. During the second clock cycle, the fetched instruction is transferred and decoded by instruction decode hardware, but at the same time, the next instruction is fetched by the instruction fetch hardware. During the third clock cycle, each instruction is shifted to the next stage of the pipeline and a new instruction is fetched. Thus, after the pipeline is filled, an instruction will be completely executed at the end of each clock cycle.
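The overlap described above can be illustrated with a small scheduling sketch. It assumes an ideal pipeline with no stalls and uses the five steps listed earlier (instruction fetch, instruction decode, operand fetch, execution, result store) as stage names; the function names are hypothetical.

```python
STAGES = ["fetch", "decode", "operand fetch", "execute", "result store"]


def schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to {stage: instruction number} for a stall-free pipeline.

    Instruction i (numbered from 0) occupies stage s during cycle i + s + 1.
    """
    table = {}
    for i in range(num_instructions):
        for s, name in enumerate(stages):
            table.setdefault(i + s + 1, {})[name] = i
    return table


def completion_cycles(num_instructions, stages=STAGES):
    """Cycle at the end of which each instruction leaves the final stage."""
    return [i + len(stages) for i in range(num_instructions)]


for cycle, busy in sorted(schedule(3).items()):
    print(cycle, busy)
print(completion_cycles(4))  # [5, 6, 7, 8]
```

Once the pipeline has filled (after cycle 5 in this five-stage model), one instruction completes in every subsequent cycle, which is the effect the text describes.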

This process is analogous to an assembly line in a manufacturing environment. Each worker is dedicated to performing a single task on every product that passes through his or her work stage. As each task is performed, the product comes closer to completion. At the final stage, each time the worker performs his or her assigned task, a completed product rolls off the assembly line.

As shown in FIG. 1, each CPU 12 is partitioned into at least three functional units: the memory access unit 16, the instruction unit 18, and the execution unit 20.

The memory access unit 16 includes a main cache 22 which, on an average basis, enables the instruction and execution units 18, 20 to process data at a faster rate than the access time of the main memory 14. This cache 22 includes means for storing selected predefined blocks of data elements, means for receiving requests from the instruction unit 18 via a translation buffer 24 to access a specified data element, means for checking whether the data element is in a block stored in the cache 22, and means operative when data for the block including the specified data element is not so stored for reading the specified block of data into the cache 22. In other words, the cache provides a "window" into the main memory, and contains data likely to be needed by the instruction and execution units 18, 20. The organization and operation of a similar cache and translation buffer are further described in Chapter 11 of Levy and Eckhouse, Jr., Computer Programming and Architecture, The VAX-11, Digital Equipment Corporation, pp. 351-368 (1980).

If a data element needed by the instruction and execution units 18, 20 is not found in the cache 22, then the data element is obtained from the main memory 14, but in the process, an entire block, including additional data, is obtained from the main memory 14 and written into the cache 22. Due to the principle of locality in time and memory space, the next time the instruction and execution units desire a data element, there is a high degree of likelihood that this data element will be found in the block which includes the previously addressed data element. Consequently, it is probable that the cache 22 will already include the data element required by the instruction and execution units 18, 20. In general, since the cache 22 is accessed at a much higher rate than the main memory 14, the main memory 14 can have a proportionally slower access time than the cache 22 without substantially degrading the average performance of the computer system 10. Therefore, the main memory 14 can be constructed of slower and less expensive memory elements.

The translation buffer 24 is a high speed associative memory which stores the most recently used virtual-to-physical address translations. In a virtual memory system, a reference to a single virtual address can cause several memory references before the desired information is made available. However, where the translation buffer 24 is used, translation is reduced to simply finding a "hit" in the translation buffer 24.

The instruction unit 18 includes a program counter 26 and a virtual instruction cache (VIC) 28 for fetching instructions from the main cache 22. The program counter 26 preferably addresses virtual memory locations rather than the physical memory locations of the main memory 14 and the cache 22. Thus, the virtual address of the program counter 26 must be translated into the physical address of the main memory 14 before instructions can be retrieved. Accordingly, the contents of the program counter 26 are transferred to the memory access unit 16 where the translation buffer 24 performs the address conversion. The instruction is retrieved from its physical memory location in the cache 22 using the converted address. The cache 22 delivers the instruction over data return lines to the VIC 28.
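The address conversion performed by the translation buffer 24, as described above, can be sketched as a small associative cache of recent translations. This is an illustrative model only: the 512-byte page size matches the VAX architecture, but the least-recently-used replacement policy, the `page_table` dictionary that stands in for the multi-reference page-table walk, and all names are assumptions.

```python
from collections import OrderedDict

PAGE_BITS = 9  # 512-byte pages, as on the VAX


class TranslationBuffer:
    """Small fully associative cache of recent virtual-to-physical translations.

    Hypothetical model: `page_table` maps virtual page number -> physical page
    frame and stands in for the slow page-table walk; replacement is LRU.
    """

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table
        self.entries = OrderedDict()  # virtual page -> frame, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.entries:
            self.hits += 1                    # a "hit": no memory references needed
            self.entries.move_to_end(vpn)
        else:
            self.misses += 1                  # miss: consult the (slow) page table
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return (self.entries[vpn] << PAGE_BITS) | offset


tb = TranslationBuffer(capacity=2, page_table={0: 5, 1: 7, 2: 9})
print(hex(tb.translate(0x010)))  # 0xa10 (miss)
print(hex(tb.translate(0x020)))  # 0xa20 (hit)
print(tb.hits, tb.misses)        # 1 1
```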

Generally, the VIC 28 contains prestored instructions at the addresses specified by the program counter 26, and the addressed instructions are available immediately for transfer into an instruction buffer (IBUFFER) 30. From the buffer 30, the addressed instructions are fed to an instruction decoder 32 which decodes both the opcodes and the specifiers. An operand processing unit (OPU) 34 fetches the specified operands and supplies them to the execution unit 20.

The OPU 34 also produces virtual addresses. In particular, the OPU 34 produces virtual addresses for memory source (read) and destination (write) operands. For the memory read operands, the OPU 34 delivers these virtual addresses to the memory access unit 16 where they are translated to physical addresses. The physical memory locations of the cache 22 are then accessed to fetch the operands for the memory source operands.

In each instruction, the first byte contains the opcode, and the following bytes are the operand specifiers to be decoded. The first byte of each specifier indicates the addressing mode for that specifier. This byte is usually broken in halves, with one half specifying the addressing mode and the other half specifying a register to be used for addressing. The instructions preferably have a variable length, and various types of specifiers can be used with the same opcode, as disclosed in Strecker et al. U.S. Patent 4,241,397 issued December 23, 1980.
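Splitting a specifier byte into its two halves, as described above, is a pair of bit operations. The sketch assumes the VAX convention (bits 7:4 hold the addressing mode, bits 3:0 the register number); the text itself only says the byte is broken in halves, so which half is which is our assumption, and the function name is hypothetical.

```python
def split_specifier(spec_byte):
    """Split an operand-specifier byte into (addressing mode, register number).

    Assumes the VAX convention: bits 7:4 hold the mode, bits 3:0 the register.
    """
    if not 0 <= spec_byte <= 0xFF:
        raise ValueError("a specifier byte must be in the range 0..255")
    return spec_byte >> 4, spec_byte & 0x0F


# Under the VAX encoding, ADDL3 R0,B^12(R1),R2 is the byte sequence
# C1 50 A1 0C 52: opcode, then register mode R0, byte displacement off R1
# (followed by the displacement byte 0C), then register mode R2.
for b in (0x50, 0xA1, 0x52):
    print(hex(b), split_specifier(b))
```

Mode 5 is register mode and mode 0xA is byte-displacement mode on the VAX, which is how the ADDL3 example used elsewhere in this description would be encoded.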

The first step in processing the instructions is to decode the opcode portion of the instruction. The first portion of each instruction consists of its opcode, which specifies the operation to be performed in the instruction and the number and type of specifiers to be used. Decoding is accomplished using a table-look-up technique in the instruction decoder 32. Later, the execution unit 20 performs the specified operation by executing prestored microcode, beginning at a predetermined starting address for the specified operation. Also, the decoder 32 determines where source-operand and destination-operand specifiers occur in the instruction and passes these specifiers to the OPU 34 for preprocessing prior to execution of the instruction. A preferred instruction decoder for use with the refill method and apparatus of the present invention is described in the above referenced D. Fite et al. U.S. patent application Serial No. ___, filed ___ and entitled "Decoding Multiple Specifiers in a Variable Length Instruction Architecture," incorporated herein by reference.
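Table-look-up decoding of the opcode, as described above, can be sketched with a dictionary keyed by opcode byte. The two entries use the real VAX opcodes for ADDL3 (0xC1) and MOVL (0xD0), but the table layout, the (access, size) tuples, and the function name are hypothetical and far simpler than the decoder referenced above.

```python
# Hypothetical, heavily abridged decode table: opcode byte ->
# (mnemonic, [(access type, operand size in bytes) for each specifier]).
OPCODE_TABLE = {
    0xC1: ("ADDL3", [("read", 4), ("read", 4), ("write", 4)]),
    0xD0: ("MOVL", [("read", 4), ("write", 4)]),
}


def decode_opcode(opcode):
    """Return (mnemonic, number of specifiers, specifier descriptions)."""
    try:
        mnemonic, specifiers = OPCODE_TABLE[opcode]
    except KeyError:
        raise ValueError(f"opcode {opcode:#04x} is not in this sketch's table") from None
    return mnemonic, len(specifiers), specifiers


print(decode_opcode(0xC1))
```

The specifier count returned here is what tells the rest of the decoder how many specifiers follow the opcode byte.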

After an instruction has been decoded, the OPU 34 parses the operand specifiers and computes their effective addresses; this process involves reading GPRs and possibly modifying the GPR contents by autoincrementing or autodecrementing. The operands are then fetched from those effective addresses and passed on to the execution unit 20, which executes the instruction and writes the result into the destination identified by the destination pointer for that instruction.
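The effective-address step described above can be sketched for three common addressing modes. The semantics follow the VAX conventions (displacement adds an offset to the register; autoincrement uses the register and then advances it by the operand size; autodecrement first backs the register up), but the mode names, the function name, and the use of a plain dictionary for the GPRs are assumptions for illustration.

```python
def effective_address(mode, reg, regs, size=4, disp=0):
    """Compute an operand's effective address, updating `regs` (the GPRs) in place.

    Hypothetical mode names; only three addressing modes are modeled.
    `size` is the operand size in bytes used for autoincrement/autodecrement.
    """
    if mode == "displacement":       # disp(Rn): Rn + disp, register unchanged
        return (regs[reg] + disp) & 0xFFFFFFFF
    if mode == "autoincrement":      # (Rn)+ : use Rn, then Rn += size
        ea = regs[reg]
        regs[reg] = (ea + size) & 0xFFFFFFFF
        return ea
    if mode == "autodecrement":      # -(Rn) : Rn -= size, then use Rn
        regs[reg] = (regs[reg] - size) & 0xFFFFFFFF
        return regs[reg]
    raise ValueError(f"unsupported mode {mode!r}")


gprs = {1: 0x1000}
print(hex(effective_address("displacement", 1, gprs, disp=12)))  # 0x100c
```

The displacement case corresponds to the B^12(R1) operand of the ADDL3 example discussed with FIG. 2.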

Each time an instruction is passed to the execution unit 20, the instruction unit 18 sends a microcode dispatch address and a set of pointers for (1) the locations in the execution unit register file where the source operands can be found, and (2) the location where the results are to be stored. Within the execution unit 20, a set of queues 36 includes a fork queue for storing the microcode dispatch address, a source pointer queue for storing the source-operand locations, and a destination pointer queue for storing the destination location. Each of these queues is a FIFO buffer capable of holding the data for multiple instructions.
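The three FIFO queues described above can be sketched with `collections.deque`. The class, field, and method names are hypothetical, and the pointer values in the usage line are placeholder strings rather than real register-file encodings.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExecutionQueues:
    """FIFO queues between the instruction unit and the execution unit (sketch).

    Field names are hypothetical; each queue holds one entry per instruction.
    """
    fork: deque = field(default_factory=deque)         # microcode dispatch addresses
    source_ptrs: deque = field(default_factory=deque)  # source-operand locations
    dest_ptrs: deque = field(default_factory=deque)    # destination locations

    def issue(self, dispatch_addr, sources, dest):
        # Instruction-unit side: enqueue everything for one instruction.
        self.fork.append(dispatch_addr)
        self.source_ptrs.append(tuple(sources))
        self.dest_ptrs.append(dest)

    def next_instruction(self):
        # Execution-unit side: dequeue in the same first-in, first-out order.
        return (self.fork.popleft(), self.source_ptrs.popleft(),
                self.dest_ptrs.popleft())


q = ExecutionQueues()
q.issue(0x100, ["GPR R0", "source list entry 3"], "GPR R2")
print(q.next_instruction())
```

Because every queue is FIFO, the execution unit always pairs the dispatch address with the pointers that were issued for the same instruction.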

The execution unit 20 also includes a source list 38, which is a multi-ported register file containing a copy of the GPRs and a list of source operands. Thus, entries in the source pointer queue will either point to GPR locations for register operands, or point to the source list for memory and literal operands. Both the memory access unit 16 and the instruction unit 18 write entries in the source list 38, and the execution unit 20 reads operands out of the source list 38 as needed to execute the instructions. For executing instructions, the execution unit 20 includes an instruction issue unit 40, a microcode execution unit 42, an arithmetic and logic unit (ALU) 44, and a retire unit 46.

The present invention is particularly useful with pipelined processors. As discussed above, in a pipelined processor, the processor's instruction fetch hardware may be fetching one instruction while other hardware is decoding the operation code of a second instruction, fetching the operands of a third instruction, executing a fourth instruction, and storing the processed data of a fifth instruction. FIG. 2 illustrates a pipeline for a typical instruction such as:

ADDL3 R0,B^12(R1),R2

This is a longword addition using the displacement mode of addressing.

In the first stage of the pipelined execution of this instruction, the program count (PC) of the instruction is created; this is usually accomplished either by incrementing the program counter from the previous instruction, or by using the target address of a branch instruction. The PC is then used to access the VIC 28 in the second stage of the pipeline.

In the third stage of the pipeline, the instruction data is available from the cache 22 for use by the instruction decoder 32, or to be loaded into the IBUFFER 30. The instruction decoder 32 decodes the opcode and the three specifiers in a single cycle, as will be described in more detail below. The R0 and R2 numbers are passed to the ALU 44, and the R1 number along with the byte displacement is sent to the OPU 34 at the end of the decode cycle.

In stage four, the OPU 34 reads the contents of its GPR register file at location R1, adds that value to the specified displacement (12), and sends the resulting address to the translation buffer 24 in the memory access unit 16, along with an OP READ request, at the end of the address generation stage.

In stage five, the memory access unit 16 selects the address generated in stage four for execution. Using the translation buffer 24, the memory access unit 16 translates the virtual address to a physical address during the address translation stage. The physical address is then used to address the cache 22, which is read in stage six of the pipeline.

In stage seven of the pipeline, the instruction is issued to the ALU 44, which adds the two operands and sends the result to the retire unit 46. During stage four, the register numbers for R1 and R2, and a pointer to the source list location for the memory data, are sent to the execution unit and stored in the pointer queues. Then, during the cache read stage, the execution unit looks for the two source operands in the source list. In this particular example, it finds only the register data R0, but at the end of this stage the memory data arrives and is substituted for the invalidated read-out of the register file. Thus, both operands are available in the instruction execution stage.

In the retire stage eight of the pipeline, the result data is paired with the next entry in the retire queue. Although several functional execution units can be busy at the same time, only one instruction is retired in a single cycle.

In the last stage nine of the illustrative pipeline, the data is written into the GPR portion of the register files in both the execution unit 20 and the instruction unit 18.

Referring now to FIG. 3, a block diagram of the virtual instruction cache (VIC) 28 is illustrated. The VIC 28 is constructed of four groups of self-timed RAMs (STRAMs), and acts as a window into the main memory 14. In this regard, the VIC 28 functions in a similar fashion to the main cache 22. The first group of VIC STRAMs is the data stram 50, which provides storage space for the actual instruction stream (ISTREAM) retrieved from the main cache 22. Specifically, the data stram 50 contains 1024 storage locations, with each storage location being 64 bits in width. From the size of the data stram 50, it should be apparent that the ISTREAM is retrieved in quadword (8-byte) packets. Accordingly, the data path between the main cache 22 and the VIC 28 is also 64 bits in width, and a quadword of ISTREAM can be transferred during each system clock cycle.

The PC 26 delivers bits 12:3 of the 32-bit virtual address to the data stram 50 in order to address each quadword of ISTREAM. Bits 2:0 are unnecessary, as they are only needed to address individual bytes within each quadword. Individual byte addressability is not necessary for the proper operation of the VIC 28. Rather, the smallest increment of ISTREAM which can be addressed in the VIC 28 is a quadword. Further, the upper bits 31:13 are not used to address the data stram 50 because only 1024 quadword locations are available for storing the ISTREAM. Accordingly, the 10 bits 12:3 are sufficient to provide a unique address for each of the 1024 data storage locations (i.e., 2^10 = 1024).
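The address fields used by the VIC, as laid out above and in the following paragraphs, can be extracted with shifts and masks. The field boundaries come from the text (bits 31:13 tag, 12:3 data-stram index, 12:5 block index, 4:3 quadword within block, 2:0 byte); the function name and dictionary keys are hypothetical. The example address 0x1FE8 is the binary address 00000000000000000001111111101000 used in this description for the quadword valid multiplexer.

```python
def vic_fields(vaddr):
    """Split a 32-bit virtual address into the fields used by the VIC (FIG. 3)."""
    vaddr &= 0xFFFFFFFF
    return {
        "tag": vaddr >> 13,                   # bits 31:13, 19 bits kept in the tag stram
        "data_index": (vaddr >> 3) & 0x3FF,   # bits 12:3, one of 1024 quadword locations
        "block_index": (vaddr >> 5) & 0xFF,   # bits 12:5, one of 256 four-quadword blocks
        "qw_in_block": (vaddr >> 3) & 0x3,    # bits 4:3, quadword within the block
        "byte": vaddr & 0x7,                  # bits 2:0, not used by the VIC
    }


print(vic_fields(0x1FE8))
# {'tag': 0, 'data_index': 1021, 'block_index': 255, 'qw_in_block': 1, 'byte': 0}

# Two addresses that differ only in the upper bits map to the same data stram
# location, which is why a tag must be stored alongside each block.
print(vic_fields(0x80001FE8)["data_index"] == vic_fields(0x1FE8)["data_index"])  # True
```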

However, it should be clear that since the upper bits 31:13 are not used to address the data stram 50, there are multiple quadwords which must be stored at identical data stram locations. For example, the quadword located at address 11111111111111111110000000000 will be stored at the same data stram location as the quadword located at address 01111111111111111110000000000. Both addresses share the same lower 10 bits and must, therefore, share the same data stram storage location. In fact, each data stram location can host any one of 524,288 (2^19 = 524,288) quadwords.

Accordingly, in order to determine which of these quadwords is stored in each of the data stram locations, a set of tag strams 52 is provided. The tag strams 52 store the upper nineteen bits 31:13 of the quadword address. However, ISTREAM is retrieved from the main cache 22 in four-quadword blocks. In other words, a request to the main cache 22 for the first quadword in a block causes the main cache 22 to also return the three following quadwords. Retrieving ISTREAM in blocks exploits the principle of locality in time and memory space and aids the overall performance of the VIC 28. Accordingly, the 1024 data stram locations are identified by only 256 tag stram locations (one for each four-quadword block). Thus, the tag stram 52 contains 256 19-bit storage locations, and the eight bits 12:5 of the virtual address are sufficient to identify each of the 256 storage locations (2^8 = 256).

Operation of the VIC 28 is enhanced by the method used for retrieving ISTREAM from the main cache 22. The request for ISTREAM is always quadword aligned and can be for any quadword within a block. However, the main cache 22 only responds with the requested quadword and all subsequent quadwords to fill the block. Quadwords prior to the request in the block are not returned from the main cache 22. For example, if the VIC 28 requests the third quadword in a block, only the third and fourth quadwords are returned from the main cache 22 and are written into the data stram 50. This method of retrieving ISTREAM is employed for two reasons. First, by returning the requested quadword first rather than the first quadword in that block, the requested ISTREAM address is available immediately and the critical response time is enhanced. Second, performance models indicate that the remainder of the block is hardly used.
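The requested-quadword-first fill described above can be modeled in a few lines. Quadwords are numbered from 0 here, so the text's "third quadword" is index 2. We assume, as a simplification not stated in the text, that the block has just been allocated so that only the returned quadwords end up valid; the function name is hypothetical.

```python
def fill_block(requested_qw, qw_per_block=4):
    """Model the main-cache response to a VIC request for one quadword of a block.

    The requested quadword is returned first, followed by every later quadword
    in the block; earlier quadwords are never returned.  Assumes the block was
    freshly allocated, so only returned quadwords end up valid.
    Returns (order in which quadwords are written, quadword valid bits).
    """
    if not 0 <= requested_qw < qw_per_block:
        raise ValueError("requested_qw must index a quadword within the block")
    returned = list(range(requested_qw, qw_per_block))
    valid = [qw in returned for qw in range(qw_per_block)]
    return returned, valid


print(fill_block(2))  # ([2, 3], [False, False, True, True])
```

The valid bits produced here are exactly the partial-block situation that the quadword valid stram, described next, has to track.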

Since it is possible for only a portion of a block to be present in the data stram 50, it is necessary to keep track of which quadwords are valid. Therefore, a quadword valid stram 54 is provided. A valid bit is maintained for each quadword in the data stram 50. The quadword valid stram 54 is organized similarly to the tag stram 52, in that it contains 256 4-bit storage locations. Each storage location corresponds to a four-quadword block of data stored in the data stram 50, with each of the four valid bits corresponding to a quadword within the block. Thus, like the tag stram 52, the quadword valid stram is addressed by the eight bits 12:5 of the virtual address.

Further, however, the individual quadword valid bits must also be independently addressable in order to determine if a particular ISTREAM quadword requested by the IBUFFER 30 is valid. A multiplexer 56 is connected to the 4-bit output of the quadword valid stram 54. The select input of the multiplexer 56 is connected to the quadword-identifying bits 4:3 of the virtual address. For example, a request from the IBUFFER 30 for the quadword stored at location 00000000000000000001111111101000 results in the four quadword valid bits stored at location 11111111 of the quadword valid stram being delivered to the multiplexer 56. Bits 4:3 of the virtual address indicate that the quadword at location 01 within the block is the desired quadword. Thus, the select lines of the multiplexer 56 cause the quadword valid bit corresponding to the selected quadword to be delivered at the multiplexer output.

Finally, the fourth group of VIC strams 58 contains valid bits for each block stored in the data stram 50. Thus, the block valid stram 58 contains 256 1-bit storage locations and is addressed by bits 12:5 of the virtual address. Not only is it necessary for the VIC 28 to "know" which quadwords within a block are valid, but also, the VIC 28 needs to verify that the block itself is valid. At this time it is sufficient to understand that the block valid bit must be set before the VIC 28 will allow the selected quadword to be transferred to the IBUFFER 30. However, it should be noted that the block valid stram actually consists of two sets of strams to speed operation of the VIC 28 during a flush. At any given time, a selected one of the two sets of strams stores the block valid bits which reflect the current status of the data in the VIC 28. The addressed block valid bit, representing the validity of the addressed block of data in the VIC 28, is selected by a multiplexer 238 as either the "BLOCK_A_VALID" bit from the first set of strams (set A), or the "BLOCK_B_VALID" bit from the second set of strams (set B). This aspect of the VIC 28 is discussed in greater detail in conjunction with the description of the operation of the circuit shown in FIG. 9.

During an IBUFFER request for a selected quadword of ISTREAM, the virtual address contained in the PC 26 is delivered to the VIC 28. The VIC 28 responds to the request by determining if the requested quadword is present in the data stram 50 and, if so, whether it is valid. Bits 31:13 of the PC virtual address are delivered to one input of a 19-bit comparator 60. The second input to the comparator 60 is connected to the output of the tag stram 52. Previously, bits 31:13 of the address of the quadword stored in the data stram 50 were stored in the tag stram 52. Therefore, those previously stored bits 31:13 are presented as the second input to the comparator 60. If the two addresses match, the asserted output of the comparator 60 is delivered as one input to the 3-input AND gate 62. At the same time, the block and quadword valid bits are also delivered as inputs to the AND gate 62. Accordingly, if any of the three signals is not asserted, the AND gate 62 produces a MISS signal. Conversely, if all three signals are asserted, the AND gate 62 produces a HIT signal. A MISS signal initiates a request to the main cache 22, while a HIT signal causes the data stram 50 to deliver the selected quadword of data.
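The HIT/MISS decision described above (comparator 60 feeding the 3-input AND gate 62 together with the block and quadword valid bits) can be sketched as a single boolean function. The list-based layout of the strams and the function name are assumptions, and the sketch uses one block valid array, ignoring the two-set arrangement selected by multiplexer 238.

```python
def vic_lookup(vaddr, tag_stram, block_valid, qw_valid):
    """Return True for a VIC HIT and False for a MISS.

    Mirrors comparator 60 and the 3-input AND gate 62: the stored tag must
    match bits 31:13, and both the block valid bit and the selected quadword
    valid bit must be set.  Hypothetical data layout: `tag_stram` and
    `block_valid` are 256-entry lists, `qw_valid` is 256 lists of 4 booleans,
    all indexed by address bits 12:5.
    """
    tag = (vaddr >> 13) & 0x7FFFF   # bits 31:13 (19 bits)
    block = (vaddr >> 5) & 0xFF     # bits 12:5 select the stram location
    qw = (vaddr >> 3) & 0x3         # bits 4:3 drive the quadword valid multiplexer
    tag_match = tag_stram[block] == tag
    return bool(tag_match and block_valid[block] and qw_valid[block][qw])


tags = [0] * 256
block_ok = [False] * 256
qw_ok = [[False] * 4 for _ in range(256)]
print(vic_lookup(0x1FE8, tags, block_ok, qw_ok))  # False: block not yet valid
block_ok[255], qw_ok[255][1] = True, True
print(vic_lookup(0x1FE8, tags, block_ok, qw_ok))  # True
```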

The PC 26 is actually constructed of several separate program counters. During each system clock cycle, one of two PCs (PREFETCH PC or MTAG) is selected and its virtual address is delivered to the VIC 28. Generally, the virtual address contained in the PREFETCH PC is selected and delivered to the VIC 28. The PREFETCH PC always points to the next quadword that the IBUFFER is ready to accept. In sequential code, the PREFETCH PC is incremented by one quadword each time the IBUFFER accepts ISTREAM from the VIC 28. When the ISTREAM branches, the PREFETCH PC is loaded with the correct destination address.

However, when ISTREAM is requested from and delivered by the main cache 22, the virtual address contained in the MTAG is selected and delivered to the VIC 28. When the VIC 28 receives multiple quadwords of ISTREAM from the main cache 22, the address of the VIC 28 must be incremented by a quadword in each cycle of the main cache response. The PREFETCH PC would serve this purpose if the instruction decoder 32 could always consume all of the ISTREAM as it arrives from the main cache 22. In practice this is not always possible. Therefore, a second PC (the MTAG), independent of the PREFETCH PC, is used to store the ISTREAM in the VIC 28. Once the response from the main cache 22 is complete, the PREFETCH PC is again used to address the VIC 28. The MTAG is loaded with the previous value of the VIC address when there is no request to the main cache 22.
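
The choice between the PREFETCH PC and the MTAG can be sketched as a small per-cycle model. This is a simplification under stated assumptions: the class and its `clock` method are hypothetical, main-cache timing is reduced to a single `main_cache_responding` flag, and the MTAG is modeled as copying the VIC address used in every cycle in which no main-cache response is in progress.

```python
from dataclasses import dataclass

QUADWORD = 8  # bytes per quadword


@dataclass
class VicAddressSelect:
    """Toy model of choosing PREFETCH PC or MTAG each cycle (hypothetical class)."""
    prefetch_pc: int
    mtag: int = 0
    main_cache_responding: bool = False

    def clock(self, ibuffer_accepted: bool) -> int:
        """Return the VIC address used this cycle and update both PCs."""
        if self.main_cache_responding:
            addr = self.mtag
            self.mtag += QUADWORD  # step one quadword per main-cache response cycle
        else:
            addr = self.prefetch_pc
            self.mtag = addr  # MTAG shadows the previous VIC address
            if ibuffer_accepted:
                self.prefetch_pc += QUADWORD  # sequential code
        return addr

    def branch(self, target: int) -> None:
        """Load the PREFETCH PC with a branch destination."""
        self.prefetch_pc = target
```

Because the MTAG advances independently during a main-cache response, the PREFETCH PC stays where the decoder left off, which is the reason the text gives for having two counters.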

Referring now to FIG. 4, the IBUFFER 30 is illustrated. The IBUFFER 30 aligns the data for decoding and increases the processing speed of the instruction unit 18 by prefetching subsequent sequential instructions. The IBUFFER 30 retrieves a selected quadword of the ISTREAM and positions that quadword such that the instruction decoder 32 receives the instruction with the opcode positioned in the zero byte location. In order to accomplish this complex task of repositioning the ISTREAM, the IBUFFER 30 is separated into five major functional sections: IBEX 64 & IBEX2 66, ROTATOR 68, SHIFTER 70, MERGE MULTIPLEXER 72, and IBUF 74.

Rather than simply increase the size of the instruction decoder 32 to contain more bytes of the ISTREAM, a pair of prefetching buffers, IBEX 64 and IBEX2 66, are disposed intermediate the decoder 32 and the VIC 28. IBEX 64 and IBEX2 66 are quadword buffers functionally positioned between the VIC 28 and the IBUF 74 and operable to retrieve the next sequential quadword of ISTREAM while the decoder 32 is operating on the present instruction. This prefetching normally hides the time required for a VIC access by performing the instruction fetch during the time in which the decoder 32 is busy. Any one of the quadwords stored in the VIC 28 is controllably storable in the IBEX 64 and IBEX2 66. As discussed previously, the PREFETCH PC controls operation of the VIC 28 to select and deliver a quadword of ISTREAM. The quadword currently selected by the PREFETCH PC is stored in the IBEX 64 while the next subsequent quadword of ISTREAM is retrieved from the VIC 28 and stored in the IBEX2 66.

The purpose of the IBEX 64 and IBEX2 66 is to prefetch the subsequent two quadwords of ISTREAM and sequentially provide these bytes of ISTREAM to fill the IBUF 74 as each instruction is consumed by the instruction decoder 32. It should be noted that the present computer system preferably employs an instruction set of the variable length type. Accordingly, until the instruction decoder 32 actually decodes the opcode of the instruction, the number of bytes dedicated to the instant instruction is not "known" by the IBUFFER 30. Therefore, the IBUFFER 30 does not "know" how many bytes will be consumed by the instruction decoder 32 and will need to be refilled by the IBUFFER 30. Thus, the logic which controls the operation of the IBEX 64, IBEX2 66, and VIC 28 must be capable of determining the number of bytes needed to fill the decoder 32, which location or multiple locations contain the desired bytes, and whether those bytes are valid.

The control logic for operating the IBEX 64, IBEX2 66, and VIC 28 includes a multiplexer 76 with control logic 78 operating the select inputs of the multiplexer 76. The IBEX 64, IBEX2 66, and VIC 28 each include an 8-byte wide data path connected to the inputs of the multiplexer 76, such that any input may be selected by the control logic 78 and delivered over an 8-byte wide data path to the rotator 68 and to the IBEX 64. The IBEX2 66 is connected directly to the VIC 28 and receives the next sequential quadword of ISTREAM over the 8-byte data path therebetween. Operation of the multiplexer 76 and control logic 78 is discussed in greater detail in conjunction with the description accompanying FIGS. 8 and 10.

The merge multiplexer 72, rotator 68 and shifter 70 interact to keep the 9-byte instruction decoder 32 filled with the next nine sequential bytes of ISTREAM. As the decoder 32 completes the decoding stage of each instruction, those consumed bytes are shifted out and discarded by the shifter 70. The rotator 68 acts to provide the next sequential bytes of ISTREAM to replace those bytes which were discarded. In this manner, the instruction buffer 30 attempts to provide at least the next 9 bytes of ISTREAM to the instruction decoder 32. Therefore, independent of the length of the present instruction, the decoder 32 is assured that for the majority of instructions (relatively few instructions require more than 9 bytes) the entire instruction is present and available for decoding.

The IBUF 74 is a 9-byte register for storing the results of the merge multiplexer 72 until the decoder 32 is available to accept the ISTREAM. Further, the output of the IBUF 74 is also connected to the input of the shifter 70.

Turning now to FIG. 5, the data paths to and from the instruction decoder 32 are shown in greater detail. In order to simultaneously decode a number of operand specifiers, the IBUF 74 is linked to the instruction decoder 32 by a data path 80 for conveying the values of up to nine bytes of an instruction currently being decoded. Associated with the eight bits of each byte is a parity bit for detecting any single bit errors in the byte, and also a valid data flag for indicating whether the IBUF 74 has, in fact, been filled with data from the VIC 28 as requested by the program counter 26.

The instruction decoder 32 decodes a variable number of specifiers depending upon the particular opcode being decoded, the amount of valid data in the IBUF 74, and whether the downstream stages in the pipeline are available to accept more specifiers. Specifically, the instruction decoder 32 inspects the opcode to determine the number of subsequent bytes which are associated with that particular instruction. Then the decoder 32 checks the valid data flags to determine how many of the associated specifiers can be decoded, and then decodes these specifiers in a single cycle. The instruction decoder 32 delivers a signal indicating the number of bytes that were decoded in order to remove these bytes from the IBUF 74. For example, if the opcode includes four bytes of associated specifiers, the decoder inspects the valid flags to ensure that these four bytes are valid and then decodes these specifiers. Thereafter, the decoder instructs the shifter 70 to remove the opcode and the consumed four bytes and move the upper four bytes into the low order four byte locations. This shifting process is effective to move the next opcode into the zero byte location of the IBUF 74.

The IBUF 74 need not be large enough to hold an entire instruction, so long as it may hold at least three specifiers of the kind which are typically found in an instruction. The instruction decoder 32 is somewhat simplified if the byte 0 position of the IBUF 74 holds the opcode while the other bytes of the instruction are shifted into and out of the IBUF 74. In effect, the IBUF 74 holds the opcode in byte 0 and functions as a first-in, first-out buffer for byte positions 1 through 8. The instruction decoder 32 is also simplified by the operating criterion that only the specifiers for a single instruction are decoded during each cycle of the system clock. Therefore, at the end of a cycle in which all of the specifiers for an instruction will have been decoded, the instruction decoder 32 transmits a "shift opcode" signal to the shifter 70 in order to shift the opcode out of the byte 0 position of the IBUF 74, so that the next opcode may be received in the byte 0 position.

The VIC 28 is preferably arranged to receive and transmit instruction data in blocks of multiple bytes of data. The block size is preferably a power of two, so that the blocks have memory addresses specified by a certain number of most significant bits in the address provided by the program counter 26. For example, in the preferred embodiment each block consists of 32 bytes, or four quadwords, and is addressed by a 32-bit address. Thus, bits 31:5 are unique for each block. Further, because the instructions are of variable length, the addresses of the opcodes within the ISTREAM occur at various positions within the block. To load byte 0 of the IBUF 74 with the next opcode to be executed, which may occur at any byte position within a block of instruction data from the cache, the rotator 68 is disposed in the data path from the VIC 28 to the IBUF 74. The rotator 68, as well as the shifter 70, is built from cross-bar switches. The data path from the VIC 28 includes eight parallel busses, one bus being provided for each byte of the ISTREAM.
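
As a worked illustration of the block addressing above, the sketch below splits an address into the fields implied by 32-byte blocks of four quadwords. The function name is hypothetical; the check uses the branch target 80D₁₆ from the rotator discussion in this section, whose low three bits (5) become the rotate value.

```python
def split_istream_address(addr: int):
    """Split a 32-bit ISTREAM address for 32-byte VIC blocks of four quadwords."""
    block = addr >> 5                      # bits 31:5 identify the block
    quadword_in_block = (addr >> 3) & 0x3  # bits 4:3 pick one of four quadwords
    byte_in_quadword = addr & 0x7          # bits 2:0 locate the opcode byte
    quadword_address = addr & ~0x7         # address of the containing quadword
    return block, quadword_in_block, byte_in_quadword, quadword_address
```

For 80D₁₆ this yields quadword 1 of its block (the second quadword), byte 5, and a quadword address of 808₁₆.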

In the general case, it is necessary to keep track of the number of valid bytes in the IBUF 74. The number of valid bytes at any particular instant is kept in a register called IBUF VALID COUNT 81. The value of this register is the previous IBUF VALID COUNT minus the number of bytes shifted plus the number of new bytes merged through MERGE MUX 72. Similarly, it is necessary to know how many bytes remain in IBEX 64. Any bytes that have been moved into the IBUF 74 are considered invalid. As IBUF 74 becomes full, the remaining bytes from the quadword of data, or a complete new quadword, are stored in IBEX. The number of valid bytes in IBEX is stored in a virtual register called IBEX VALID COUNT. This is not a hardware register but the output from combinational logic that produces either the previous IBEX VALID COUNT minus the number of bytes merged into the IBUF 74 if IBEX is being selected into MUX 76, or eight bytes minus the number of bytes merged into the IBUF 74 if IBEX2 or VIC is selected into MUX 76.
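
The two bookkeeping rules above can be written as plain arithmetic. This is a sketch only; the function names and the string labels for the MUX 76 source are hypothetical. The checks reuse the four-byte example given later in this section (eight valid IBUF bytes, four shifted out, five merged from a fresh VIC quadword, leaving three bytes in IBEX).

```python
def next_ibuf_valid_count(prev_count: int, bytes_shifted: int, bytes_merged: int) -> int:
    """IBUF VALID COUNT 81: previous count - bytes shifted + new bytes merged."""
    count = prev_count - bytes_shifted + bytes_merged
    if not 0 <= count <= 9:
        raise ValueError("IBUF 74 holds between zero and nine valid bytes")
    return count


def ibex_valid_count(prev_ibex_count: int, bytes_merged: int, mux76_source: str) -> int:
    """Combinational IBEX VALID COUNT; mux76_source is "IBEX", "IBEX2" or "VIC"."""
    if mux76_source == "IBEX":
        return prev_ibex_count - bytes_merged  # draining the bytes already held
    return 8 - bytes_merged                    # a fresh quadword from IBEX2 or the VIC
```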

At the beginning of a program or after a branch or jump instruction is executed, it is desirable to load the IBUF 74 with entirely new data from the VIC 28. For this purpose, the combinational logic 82 controlling the merge multiplexer 72 receives an IBUF VALID COUNT of zero, so that all of the select lines S0-S8 are set to select the rotator inputs and the merge multiplexer 72 selects data from only the B0 to B8 inputs. Since none of the bytes in the IBUF 74 are valid, they are discarded, and only the new instructions contained in the ROTATOR 68 are presented to the IBUF 74.

In order to load new ISTREAM into the IBUF 74 from the VIC 28, the MERGE MUX 72 is used to select the number of bytes from the ROTATOR 68 to be merged with a selected number of bytes from the SHIFTER 70. If the signal SHIFT OP is asserted, the output of the SHIFTER 70 will be the IBUF 74 bytes 0 through 8 shifted down by the number to shift; otherwise, if SHIFT OP is not asserted, the output of the shifter will be IBUF 74 byte 0 in position A0, with IBUF 74 bytes 1 through 8 shifted down by the number of bytes to shift.
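
The SHIFTER 70 behavior with and without SHIFT OP can be modeled over the nine IBUF positions. In this sketch (hypothetical function name), positions with no higher-order source byte are returned as None; in hardware these are simply don't-care values that the merge multiplexer does not select.

```python
def shifter_output(ibuf, n_shift: int, shift_op: bool):
    """Return A0-A8 from the nine IBUF bytes; None marks positions with no source byte."""
    if len(ibuf) != 9:
        raise ValueError("IBUF 74 has nine byte positions")

    def source(i):
        return ibuf[i] if i < 9 else None

    if shift_op:
        # Opcode is consumed too: bytes 0-8 move down by n_shift.
        return [source(k + n_shift) for k in range(9)]
    # Opcode retained in A0; bytes 1-8 move down by n_shift.
    return [ibuf[0]] + [source(k + n_shift) for k in range(1, 9)]
```

With `shift_op=False` the opcode stays in A0 while specifier bytes keep flowing past it, which is the case the text describes for instructions whose specifiers are longer than the IBUF.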

Also, when the IBUF 74 is initially loaded, there will be an offset between the start of the quadword delivered by the VIC 28 and the address corresponding to the opcode. In particular, this offset is given by the least significant bits of the program counter 26. As shown in FIG. 6, a quadword of ISTREAM (eight bytes) is delivered to the ROTATOR 68; thus, by using the three least significant bits from the program counter 26 as the rotate value, the opcode byte is delivered to the B0 input of merge mux 72. For example, suppose the program branches to 80D₁₆, i.e., byte 5 of the second quadword in a block. The quadword address is 808₁₆ and the three least significant bits of the program counter are 5, so when the VIC 28 provides the quadword, the ROTATOR 68 rotates by 5 bytes and delivers byte 5 to the B0 input of MERGE MUX 72.

In the general case, though, the rotate value is calculated using the formula:

$$\text{rotate value} = 8 - \mathrm{IBEX\_VALID\_COUNT} - (\mathrm{IBUF\_VALID\_COUNT} - \mathrm{NO.\_BYTES\_TO\_SHIFT})$$

For example, if there are nine valid bytes in the IBUF 74 and three in IBEX 64 (bytes 5, 6, 7 of a quadword) and the number of bytes to shift is two, the rotate value is minus two (8 - 3 - (9 - 2) = -2); therefore, the rotator shifts up by two (as the result was negative). Thus, the rotator 68 delivers byte 5 of the quadword in IBEX 64 to the B7 input on merge mux 72, and byte 6 to B8 (byte 7 is of no interest as it will not be merged; it is, however, delivered to the B1 input). Positive rotate values will cause the ROTATOR 68 to shift down. Combinational logic 90 computes this rotate value and delivers it to the rotator 68.
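
The rotate value formula and the resulting byte movement can be checked numerically. The sketch below assumes that output position p of the rotator receives input byte (p + r) mod 8 for rotate value r. That mapping is an assumption, chosen because it reproduces the examples in the text: a rotate of 5 on the branch to 80D₁₆, and byte 5 to B7 and byte 6 to B8 for a rotate value of minus two. The function names are hypothetical.

```python
def rotate_value(ibex_valid: int, ibuf_valid: int, n_shift: int) -> int:
    """rotate value = 8 - IBEX_VALID_COUNT - (IBUF_VALID_COUNT - NO._BYTES_TO_SHIFT)."""
    return 8 - ibex_valid - (ibuf_valid - n_shift)


def rotate(quadword, r: int):
    """Return B0-B8; positive r rotates down, negative r rotates up (assumed mapping)."""
    b = [quadword[(p + r) % 8] for p in range(8)]  # Python % is non-negative here
    return b + [b[0]]  # the ninth output carries the same byte as the first
```

Python's `%` returns a non-negative result for a positive modulus, so a negative rotate value wraps around the quadword the same way a positive one does.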



The MERGE MUX control in combinational logic 82 produces individual select lines S0-S8 for the merge mux 72, such that the relevant bytes from the SHIFTER 70 and ROTATOR 68 are delivered to the IBUF 74. If SHIFT OP is not asserted, then S0 always selects the A0 input, so that the opcode byte remains in byte 0 of the IBUF 74. The remaining selects are calculated as follows:

$$\mathrm{MERGE\_VALUE} = \mathrm{IBUF\_VALID\_COUNT} - \mathrm{NO.\_BYTES\_TO\_SHIFT}$$

Any select Si (S1-S8) whose index i is less than MERGE_VALUE selects the SHIFTER 70, and the rest select the ROTATOR 68.

For example, if there are eight valid bytes in the IBUF 74 and the number to shift is three, the merge value is five, so S1, S2, S3, S4 select the output from the SHIFTER 70, but S5, S6, S7, S8 select the output from the ROTATOR 68.
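
The MERGE_VALUE rule translates directly into a list of select decisions. In this sketch (hypothetical names), "A" stands for the SHIFTER 70 input and "B" for the ROTATOR 68 input.

```python
def merge_selects(ibuf_valid: int, n_shift: int, shift_op: bool):
    """Return S0-S8 as "A" (SHIFTER 70) or "B" (ROTATOR 68)."""
    merge_value = ibuf_valid - n_shift
    selects = ["A" if i < merge_value else "B" for i in range(9)]
    if not shift_op:
        selects[0] = "A"  # keep the opcode in byte 0 of the IBUF
    return selects
```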

Since the ROTATOR 68 receives eight bytes of data but transmits nine bytes to the MERGE MUX 72, the nine bytes delivered to the B0-B8 inputs are never all valid. The ninth byte gets the same data as the first byte, but it is only valid when the rotate value is negative.

Once an opcode has been loaded into the byte 0 position of the IBUF 74, the instruction decoder 32 examines it and the other bytes in the IBUF 74 to determine whether it is possible to simultaneously decode up to three operand specifiers. The instruction decoder 32 further separates the source operands from the destination operands. In particular, in a single cycle of the system clock, the instruction decoder 32 may decode up to two source operands and one destination operand. Flags indicating whether source operands or a destination operand are decoded for each cycle are transmitted from the instruction decoder 32 to the OPU 34.

The instruction decoder 32 simultaneously decodes up to three register specifiers per cycle. When a register specifier is decoded, its register address is placed on the transfer bus TR and sent to the source list queue 38 via a transfer unit 92 in the OPU 34.

The instruction decoder 32 may decode one short literal specifier per cycle. According to the VAX instruction architecture, the short literal specifier must be a source operand specifier. When the instruction decoder 32 decodes a short literal specifier, the short literal data is transmitted over a bus (EX) to an expansion unit 94 in the OPU 34.

Preferably, the instruction decoder 32 is capable of decoding one complex specifier per cycle. The complex specifier data is transmitted by the instruction decoder 32 over a general purpose bus (GP) to a general purpose unit 96 in the OPU 34.

Once all of the specifiers for the instruction have been decoded, the instruction decoder 32 transmits the "shift op" signal to the shifter 70. The instruction decoder 32 also transmits a microprogram "fork" address to a fork queue in the queues 38 as soon as a valid opcode is received by the IBUF 74.

Referring now to FIG. 6, a schematic diagram of the shifter 70 is shown. The A0-A8 byte inputs of the merge multiplexer 72 are illustrated connected to the 8-bit outputs of a bank of multiplexers which comprise the shifter 70. It should be remembered that the purpose of the shifter 70 is to move the unused portion of the instruction stream contained in the IBUF 74 into those bytes of the IBUF 74 which were previously consumed by the instruction decoder 32. For example, if during the previous cycle the instruction decoder 32 used the three lowest bytes (0, 1, 2) of the IBUF 74, then in order to properly present the next instruction to the decoder 32, it is preferable to shift the remaining valid six bytes (3-8) into the low order six bytes of the IBUF 74.

Accordingly, the consumed low order bytes are no longer of any immediate use to the decoder 32 and are discarded. Thus, the shifter 70 need only move high order bytes into low order byte positions and does not rotate the low order bytes into the high order byte positions. This requirement simplifies the shifter configuration for the higher order bytes, since each byte position only receives shifted bytes from those positions which are relatively higher. For example, byte position six only receives shifted bytes from its two higher order positions (7 and 8), while byte position one receives shifted bytes from its seven higher order positions (2-8).

To better describe this process, the internal configuration of one of the multiplexer banks is illustrated and generally shown at 102. The multiplexer bank 102 receives bytes 6, 7, and 8 from the IBUF 74 and delivers an output to the A6 input of the merge multiplexer 72. Within the multiplexer bank 102 is a group of eight 3-input multiplexers 102a-102h. The multiplexer 102a receives the zero bit of each of the input bytes 6, 7, and 8 at input locations 0, 1, and 2 respectively. Similarly, the multiplexers 102b-102h receive bits 1-7 respectively of the three input bytes. The select lines for each of the multiplexers 102a-102h are connected to the instruction decoder 32 and carry the 3-bit signal "number to shift". The "number to shift" signal is, of course, the number of bytes that were consumed by the instruction decoder 32.

Therefore, it can be seen that the select lines of the multiplexers 102a-102h act together to deliver all eight bits of the selected byte. For example, if the decoder 32 consumes two bytes of the ISTREAM, then the contents of the IBUF 74 are shifted by two bytes, such that byte eight is moved into the sixth byte location. Accordingly, the "number to shift" signal is set to the value two, thereby selecting the third input to the multiplexers 102a-102h. Thus, the byte eight position is selected and delivered to the merge multiplexer input A6.

The internal structures of the remaining multiplexer banks 104-114 are substantially similar, varying only in the number of input bytes. The multiplexer bank 114 has an output connected to the A7 input of the merge multiplexer 72. The inputs to the multiplexer 114 include only bytes 7 and 8 of the IBUF 74. The multiplexer bank 112 has an output connected to the A5 input of the merge multiplexer 72. The inputs to the multiplexer 112 include bytes 5, 6, 7, and 8 of the IBUF 74. The multiplexer bank 110 has an output connected to the A4 input of the merge multiplexer 72. The inputs to the multiplexer 110 include bytes 4, 5, 6, 7, and 8 of the IBUF 74. The multiplexer bank 108 has an output connected to the A3 input of the merge multiplexer 72. The inputs to the multiplexer 108 include bytes 3, 4, 5, 6, 7, and 8 of the IBUF 74. The multiplexer bank 106 has an output connected to the A2 input of the merge multiplexer 72. The inputs to the multiplexer 106 include bytes 2, 3, 4, 5, 6, 7, and 8 of the IBUF 74.

The multiplexer bank 104 differs slightly from the other multiplexer banks, in that its output is directly connected to the merge multiplexer 72 and also to the zero byte position of the IBUF 74. The byte zero case is additionally complicated by a requirement that, in addition to the shifter 70 being capable of moving any of the higher order bytes into the zero byte position, the shifter 70 must also be capable of retaining the current zero byte while the remaining bytes are shifted. This feature is desired because byte zero contains the opcode. Thus, if the specifiers extend beyond the length of the IBUF 74, then the consumed bytes must be shifted out and new specifiers rotated in, but the opcode must remain until the entire instruction is decoded. Accordingly, the inputs to the multiplexer 104 include bytes 1, 2, 3, 4, 5, 6, 7, and 8 of the IBUF 74. However, the output of the multiplexer 104 is delivered to one input of a bank of multiplexers 116. The second input to the multiplexer bank 116 is connected to the zero byte position of the IBUF 74. A single bit select line is connected to the instruction decoder 32 through an OR gate 118, so that when the instruction decoder 32 issues either a "shift opcode" or a "TO shift opcode" signal, the select line is asserted and the output of the multiplexer 104 is delivered to the A0 input of the merge multiplexer 72. Otherwise, if neither of these signals is asserted, then byte 0 is selected and delivered to the A0 input of the merge multiplexer 72.

Referring now to FIG. 7, there is shown a schematic diagram of the rotator 68. The B0-B8 byte inputs of the merge multiplexer 72 are illustrated as connected to the 8-bit outputs of a bank of multiplexers which comprise the rotator 68. It should be remembered that the purpose of the rotator 68 is to rotate the next quadword of ISTREAM so that the merge multiplexer 72 can fill the IBUF 74 with the valid low order bytes of the shifter 70 and the rotated high order bytes of the rotator 68. Further, unlike the shifter (70 in FIG. 6), each of the multiplexer banks in the rotator 68 is capable of delivering any of the input bytes at its output.

For example, if during the previous cycle the instruction decoder 32 uses the three lowest bytes (0, 1, 2) of the IBUF 74, then the shifter 70 moves the remaining valid six bytes (3-8) into the low order six bytes (0-5) of merge multiplexer inputs A0-A5. Thus, the rotator 68 rotates its low order three bytes into positions 6, 7, and 8, so that the merge multiplexer 72 can combine A0-A5 and B6-B8 to fill the IBUF 74. The low order three bytes available from the multiplexer 76 could be the low order three bytes of IBEX2 66 or the VIC 28, or any three consecutive bytes of IBEX 64.

To better describe this process, the internal configuration of one of the multiplexer banks is illustrated and generally shown at 132. The multiplexer bank 132 receives bytes 0-7 from either the VIC 28, IBEX 64, or IBEX2 66, as described in conjunction with FIGS. 4, 9, and 10. The output of the multiplexer bank 132 is delivered to the B4 input of the merge multiplexer 72. Within the multiplexer bank 132 is a group of eight 8-input multiplexers 132a-132h. The multiplexer 132a receives the zero bit of each of the input bytes 0-7 at multiplexer 132a input locations 4-3 respectively (that is, at locations 4, 5, 6, 7, 0, 1, 2, 3). Similarly, the multiplexers 132b-132h receive bits 1-7 respectively of all of the eight input bytes. The select lines for each of the multiplexers 132a-132h receive the 3-bit rotate value as described in conjunction with FIG. 5. This signal is, of course, the number of byte positions that the ISTREAM should be rotated to properly fill the IBUF 74.

It can be seen that if the rotate value is selected to be a value of three by the rotator control logic 90, the multiplexers 132a-132h will each select the input located at position three. Accordingly, bits 0-7 of input byte seven are selected and delivered to the B4 input of the merge multiplexer 72. Therefore, in response to a request for a three byte rotate, input byte seven is delivered to byte position four.

The remaining multiplexer banks 134-148 are substantially similar to the multiplexer bank 132, differing only in the order in which the input bytes are connected to the multiplexer banks 132-148. For example, the same request for a three byte rotate causes multiplexer bank 140 to deliver the sixth input byte to byte position three (B3).
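
The wiring of bank 132 generalizes to every rotator bank if input location L of the bank driving output B_p is wired to quadword byte (L + p) mod 8. That rule is an assumption inferred from the bank 132 description (byte 0 at location 4 for output B4) and the bank 140 example. The sketch also assumes that a negative rotate value reaches the 3-bit select lines in two's complement form and that the bank driving B8 is wired like the one driving B0; the function names are hypothetical.

```python
def bank_wiring(p: int):
    """Input location L of the bank driving B_p carries quadword byte (L + p) mod 8."""
    return [(loc + p) % 8 for loc in range(8)]


def rotator_bank_output(quadword, p: int, rotate_value: int):
    """Byte delivered to B_p for a given rotate value (assumed wiring and encoding)."""
    select = rotate_value & 0x7                  # assumed 3-bit select; negatives wrap
    return quadword[bank_wiring(p % 8)[select]]  # B8 assumed wired like B0
```

Under these assumptions one wiring rule per bank, plus a shared 3-bit select, is enough to realize every rotation in either direction.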

Consider now the combined effect of the operation of the rotator 68 and shifter 70. Assume both IBUF 74 and IBEX 64 are full. Also assume that the decoder 32 has consumed the low order three bytes of the IBUF 74. The decoder 32 produces a value of three as the "number to shift" signal. The shifter 70 responds to this signal by relocating the ISTREAM so that positions A0-A8 of the merge multiplexer 72 respectively receive positions 3, 4, 5, 6, 7, 8, 6, 7, 8. At the same time, the rotator control logic 90 delivers the rotate value to the rotator 68. The rotate value is set to minus six (8 - 8 - (9 - 3) = -6). Accordingly, the rotator 68 rotates its contents so that positions B0-B8 of the merge multiplexer 72 respectively receive positions 2, 3, 4, 5, 6, 7, 0, 1, 2. Therefore, the merge multiplexer successfully combines the two inputs to deliver the next nine bytes of ISTREAM to the IBUF 74 by selecting inputs A0-A5 and B6-B8.
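
Putting the three units together, one cycle of the IBUFFER datapath can be sketched as below. It is a behavioral model, not the gate-level design. It assumes SHIFT OP is asserted and reuses the assumed rotate mapping (output p receives input byte (p + r) mod 8) that reproduces the text's examples; the function name is hypothetical. The first check is the example just given; the second is the later four-byte case with a fresh VIC quadword.

```python
def ibuffer_step(ibuf, ibuf_valid, quadword, ibex_valid, n_shift):
    """One shift/rotate/merge step with SHIFT OP asserted; returns new IBUF contents."""
    # SHIFTER 70: move unconsumed bytes down by n_shift (None = no source byte).
    a = [ibuf[k + n_shift] if k + n_shift < 9 else None for k in range(9)]
    # ROTATOR 68: rotate value per the formula; positive rotates down, negative up.
    r = 8 - ibex_valid - (ibuf_valid - n_shift)
    b = [quadword[(p + r) % 8] for p in range(8)]
    b.append(b[0])  # ninth output duplicates the first
    # MERGE MUX 72: shifter below MERGE_VALUE, rotator at and above it.
    merge_value = ibuf_valid - n_shift
    return [a[i] if i < merge_value else b[i] for i in range(9)]
```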

Referring now to FIG. 8, there is shown a schematic diagram of the merge multiplexer 72 and merge multiplexer control logic 82. It should be remembered that the merge multiplexer 72 operates under control of the logic 82 to select the next nine bytes of ISTREAM from the two sets of 9-byte inputs from the rotator 68 and shifter 70. Generally, the low order bytes are selected from the shifter 70 while the rotator 68 fills the remaining high order byte positions.

The control logic 82 receives the "number to shift" signal (m) and the IBUF VALID COUNT and uses the values of those signals to select the proper input bytes.

The merge multiplexer 72 includes nine banks of multiplexers 150, 152, 154, 156, 158, 160, 162, 164, 166, with each bank receiving two byte position inputs, one byte each from the rotator 68 and shifter 70. Thus, the select line connected to each bank of multiplexers is asserted to select the rotator input and unasserted to select the shifter input.

To better describe this process, the internal configuration of one of the multiplexer banks is illustrated and generally shown at 150. The multiplexer bank 150 receives bits 0-7 from the zero byte position of both the shifter 70 (A00-A07) and rotator 68 (B00-B07). The output of the multiplexer bank 150 is delivered to the zero byte position of the IBUF 74. Contained within the multiplexer bank 150 is a group of eight 2-input multiplexers 150a-150h. The multiplexer 150a receives the zero bit of both of the zero position input bytes, such that an asserted value on the select line delivers B00 and an unasserted value delivers A00. Similarly, the multiplexers 150b-150h receive bits 1-7 respectively of both of the input bytes. The select lines for each of the multiplexers 150a-150h receive a 1-bit select signal from the priority decoder 82 in order to commonly deliver all eight bits of the selected byte to the zero input position of the IBUF 74.

Within the control logic 82, the "number to shift" signal (m) is subtracted from the IBUF VALID COUNT to determine the lowest order byte position into which the rotator inputs should be delivered. The signal m is delivered to a 1's complement generator 168 to convert the signal m into a negative value. The signal -m is delivered to an adder 170, which performs the arithmetic operation and delivers the result to a 4:16 decoder 172. Accordingly, the lower order nine output bits of the decoder produce a single asserted signal at the numeric position corresponding to the lowest order byte position into which the rotator inputs should be delivered. Therefore, this asserted byte position and all higher order byte positions should be asserted to properly select rotator inputs at the corresponding multiplexers.

For example, as discussed previously, if the "number to shift" signal is set to a value of three (with nine valid bytes in the IBUF 74), then the rotator inputs should be selected for byte positions 6 through 8. The output of the decoder 172 asserts only the line corresponding to byte position 6. Thus, a bank of OR gates 174 is connected to the outputs of the decoder 172 to provide asserted signals to the multiplexers corresponding to the asserted line and all higher order byte positions.

During normal operation, the "number to shift" signal controls the operation of the merge multiplexer 72. However, at the beginning of a program or at a context switch, the "number to shift" signal is zero and the IBUF VALID COUNT is zero, and the entire contents of the rotator 68 are loaded into the IBUF 74. Therefore, the output of the adder 170 is zero, enabling all of the outputs of the bank of OR gates 174. Thus, the select lines to the multiplexers 150-166 all act to select the B inputs and pass the entire contents of the rotator to the IBUF 74.
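
The adder, decoder and OR-gate chain amounts to a thermometer code that starts at IBUF VALID COUNT - m. The sketch below models it with hypothetical names. The text specifies a 1's complement generator 168; the sketch forms -m directly in 4-bit two's complement, which is what a 1's complement plus a carry into the adder 170 would give (the carry-in is an assumption).

```python
def merge_control(ibuf_valid: int, m: int):
    """Return select bits for merge banks 150-166 (True = rotator/B input)."""
    neg_m = (-m) & 0xF                                 # -m as a 4-bit two's complement value
    first_b = (ibuf_valid + neg_m) & 0xF               # adder 170: IBUF VALID COUNT - m
    decoded = [line == first_b for line in range(16)]  # 4:16 decoder 172, one-hot
    selects, asserted = [], False
    for pos in range(9):  # OR gates 174: assert this line if it or any lower line is set
        asserted = asserted or decoded[pos]
        selects.append(asserted)
    return selects
```

If IBUF VALID COUNT - m is 9 (nothing consumed and the buffer full), no decoder line among the low nine is asserted, so every bank keeps the shifter input.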

The control logic 78 for operating the multiplexer 76 of FIG. 4 selects IBEX 64, IBEX2 66, or VIC 28 according to a simple priority scheme. If IBEX 64 is not empty, then IBEX 64 is delivered to the ROTATOR 68; otherwise, if IBEX2 66 is valid, it is delivered to the ROTATOR 68; and if IBEX 64 is empty and IBEX2 66 is not valid, VIC data is delivered to the ROTATOR 68.
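
The priority scheme of control logic 78 is a three-way choice. A minimal sketch, with hypothetical names and with IBEX emptiness represented by its valid count:

```python
def mux76_source(ibex_valid_count: int, ibex2_valid: bool) -> str:
    """Priority used by control logic 78 to feed the ROTATOR 68."""
    if ibex_valid_count > 0:  # IBEX 64 not empty
        return "IBEX"
    if ibex2_valid:           # IBEX2 66 holds a valid prefetched quadword
        return "IBEX2"
    return "VIC"              # fall back to the VIC 28 output
```
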

IBEX is loaded each cycle with the data delivered by MUX 76, but it is marked empty either on a FLUSH or when all valid data on the ROTATOR 68 is consumed by the IBUF 74. In other words, IBEX VALID COUNT becomes non-zero when MUX 76 provides data to ROTATOR 68 that cannot find a place in IBUF 74. For example, after a branch or jump instruction has been executed, IBUF 74, IBEX 64 and IBEX2 66 are cleared (FLUSHED) and the VIC 28 is accessed for the new ISTREAM. Assume it branches to the first byte of a block that is in the VIC 28. The first quadword from the VIC 28 is presented to MUX 76, which passes the data through the ROTATOR 68 and MERGE MUX 72 to IBUF 74. IBEX is loaded with the data but is not marked valid, as all eight bytes went into the IBUF 74. In the following cycle, the VIC 28 presents the second quadword to MUX 76, which passes it to the ROTATOR 68. Now, assuming the DECODER 32 decodes fewer than eight bytes, say four bytes, the SHIFTER 70 shifts out 4 bytes, the ROTATOR 68 rotates by four, and the MERGE MUX 72 passes four bytes from the SHIFTER 70 and five bytes from the ROTATOR 68. IBEX then contains three unused bytes of ISTREAM, so IBEX VALID COUNT is set to three.

IBEX2 can be considered a stall buffer for the VIC 28. Because of the pipelined nature of creating a new prefetch address, accessing the VIC strams, and then checking for a VIC HIT, it is impractical to stop this process as soon as IBEX contains some valid bytes. Thus, data from the VIC 28 is loaded into IBEX2 66 the cycle after IBEX 64 is loaded with some valid data, and IBEX2 66 is marked valid if it is a VIC HIT. Take the above example, in which a branch to the first byte of a valid block in the VIC 28 is executed. The address of the first quadword is moved to PREFETCH PC in the first cycle. In the second cycle, the first quadword is delivered to IBUF 74 and PREFETCH PC moves on to the second quadword. In the third cycle, the second quadword is delivered to IBUF 74 and IBEX 64, and the PREFETCH PC moves to the third quadword. In the fourth cycle, assuming DECODER 32 consumes no more bytes, the third quadword is delivered to IBEX2, and PREFETCH PC moves to the fourth quadword and stalls. In the fifth cycle, the VIC 28 delivers the fourth quadword to MUX 76, but IBEX 64 data is passed to the ROTATOR 68.

As can be seen in the above example,
prefetching of ISTREAM can move significantly
ahead of the instruction in the IBUF. One benefit of
the VIC 28 is that accesses to the main cache 22
are significantly reduced. However, this benefit will
be severely reduced if prefetching continues too far
ahead of the decoded instruction stream. On
average, a branch instruction occurs once in every
sixteen bytes of ISTREAM, so it is essential that
prefetching does not access the main cache 22
unless there is a reasonable chance the data will
be used. Thus, a request to the main cache for
data is only made if there is a VIC MISS, IBEX2 is
not valid and IBEX is empty. This usually means
seven or eight bytes are still available to the
DECODER 32 when the request for a VIC block is
made.
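The condition for going to the main cache is a simple conjunction of three signals. A minimal sketch, with a hypothetical function name and the signals modeled as booleans plus a byte count:

```python
def should_request_main_cache(vic_hit: bool, ibex2_valid: bool,
                              ibex_valid_count: int) -> bool:
    """True only when prefetching has run out of buffered ISTREAM.

    Requesting earlier would risk fetching bytes that a branch
    (roughly one per sixteen ISTREAM bytes) makes useless.
    """
    return (not vic_hit) and (not ibex2_valid) and ibex_valid_count == 0


print(should_request_main_cache(False, False, 0))  # True: VIC MISS, buffers empty
print(should_request_main_cache(False, False, 3))  # False: IBEX still holds bytes
```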

Referring now to FIG. 9, there is shown a block
diagram of the two-unit valid block store stram 58
of the virtual instruction cache 28. Since the VIC 28
is a virtual cache, it must be flushed on a context
switch or REI instruction. In other words, all 256 of
the 1-bit storage locations must be marked as
invalid. Unfortunately, only one storage location can
be marked as invalid during each clock cycle.
Accordingly, if all 256 bits are set to their valid
condition, it takes 256 clock cycles to clear the
block valid stram 58.

As shown in FIG. 9, there are two block valid
strams 220, 222 (BVSA, BVSB). One of the strams
is used to determine if the presently requested
address "hits" or "misses" in the VIC 28. While the
first stram is determining hit/miss, the second stram
is being cleared at the rate of one storage location
during each clock cycle. Therefore, assuming that
256 cycles have elapsed since the last context
switch, the second stram is clear, and a context
switch is accomplished in only a single cycle
by switching the functions of the two strams. It
should be appreciated that each stram 220, 222 is
configured to perform either hit/miss determination
or valid bit clearing. In fact, each context switch
causes BVSA and BVSB to switch to the opposite
function.
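The ping-pong arrangement of the two strams can be modeled behaviorally. The sketch assumes 256 one-bit locations per stram and uses a hypothetical class with hypothetical method names (`clock`, `mark_valid`, `hit`, `context_switch`). It shows why a flush costs a single cycle provided the retired stram has had 256 cycles to clear.

```python
class DualBlockValidStore:
    """Behavioral sketch of the two block valid strams (BVSA, BVSB).

    One stram answers hit/miss lookups while the other is cleared one
    location per clock; a context switch simply swaps their roles.
    """

    SIZE = 256  # 8-bit block address -> 256 one-bit locations

    def __init__(self):
        self.strams = [[False] * self.SIZE, [False] * self.SIZE]  # A, B
        self.active = 0             # index of the stram used for hit/miss
        self.clear_ptr = self.SIZE  # nothing left to clear at start-up

    @property
    def clearing_done(self):
        return self.clear_ptr >= self.SIZE

    def clock(self):
        """One clock cycle: clear one location of the inactive stram."""
        if not self.clearing_done:
            self.strams[1 - self.active][self.clear_ptr] = False
            self.clear_ptr += 1

    def mark_valid(self, block):
        self.strams[self.active][block] = True

    def hit(self, block):
        return self.strams[self.active][block]

    def context_switch(self):
        """Swap roles in one cycle; the other stram must already be clear."""
        if not self.clearing_done:
            raise RuntimeError("context switch must wait for the reset to finish")
        self.active = 1 - self.active
        self.clear_ptr = 0  # begin clearing the stram just retired


vic = DualBlockValidStore()
vic.mark_valid(5)
print(vic.hit(5))          # True
vic.context_switch()       # one-cycle flush: roles swap
print(vic.hit(5))          # False
for _ in range(DualBlockValidStore.SIZE):
    vic.clock()            # retired stram is cleared over 256 clocks
print(vic.clearing_done)   # True
```

Raising an error in `context_switch` while clearing is still in progress stands in for the handshake that holds off the execution unit; the hardware waits instead of failing.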

BVSA and BVSB each receive a single 8-bit
address from respective multiplexers 224, 226.
Both of the multiplexers 224, 226 receive a pair of
addresses: one from the PC 28 and one from a reset
control 228. In order to present the PC address to
one of the strams 220, 222 and the reset address to
the other stram 220, 222, the select lines to the
multiplexers 224, 226 are operated in a complemented
fashion.

The reset control 228 receives a CONTEXT
SWITCH signal from the execution unit 20 and
begins to sequentially present addresses 0-255 to the
multiplexers 224, 226. One of the multiplexers 224,
226 passes these sequential addresses to the
selected stram 220, 222, such that the 256 valid bits
contained therein are reset over a period of 256
clock cycles.

In order to prevent the execution unit from
initiating a context switch before one of the strams
220, 222 is reset, the reset control delivers a
handshaking signal to indicate that the reset process is
complete. An S-R flip flop 230 receives the
handshaking signal at its set input, causing the flip flop
230 to latch a PROCEED WITH CONTEXT
SWITCH signal to the execution unit 20. The
SWITCH CONTEXT signal from the execution unit
20 is also connected to the reset input of the flip
flop 230 so that the PROCEED WITH CONTEXT
SWITCH signal is reset at the beginning of each
context switch.
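The handshake can be sketched as a cycle loop: the CONTEXT SWITCH signal resets flip-flop 230, the reset control steps through addresses 0 to 255, and the completion handshake sets the flip-flop. The helper names are hypothetical, and the model assumes the reset control asserts its handshake in the same clock that clears the last location.

```python
SIZE = 256  # valid bits per stram (8-bit address)


class SRFlipFlop:
    """Set-reset latch standing in for flip-flop 230 (hypothetical helper)."""

    def __init__(self):
        self.q = False

    def update(self, s, r):
        if s and r:
            raise ValueError("S and R must not be asserted together")
        if s:
            self.q = True
        elif r:
            self.q = False
        return self.q  # neither input asserted: hold


def proceed_after(cycles):
    """Is PROCEED WITH CONTEXT SWITCH asserted `cycles` clocks after a switch?"""
    ff_230 = SRFlipFlop()
    ff_230.update(s=False, r=True)  # CONTEXT SWITCH resets the latch
    next_addr = 0                   # reset control 228 starts at address 0
    for _ in range(cycles):
        if next_addr < SIZE:
            next_addr += 1          # one valid bit cleared this clock
        done = next_addr == SIZE    # handshake: reset process complete
        ff_230.update(s=done, r=False)
    return ff_230.q


print(proceed_after(255), proceed_after(256))  # False True
```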

Control of the select lines to the multiplexers
224, 226 is provided by a J-K flip flop 232, which
toggles between asserted and unasserted in
response to each CONTEXT SWITCH signal. Both
inputs of the flip flop 232 are connected to a logical
"1" and the clock input is connected to the
CONTEXT SWITCH signal. Thus, the Q output (USE
BLOCK B) of the flip-flop 232 switches between
"0" and "1" in response to a transition in the
SWITCH CONTEXT signal. The select input of the
multiplexer 224 is connected directly to the Q
output of the flip-flop 232, while the select input of
the multiplexer 226 is connected to the Q output of
the flip-flop 232 through an inverter 234.
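With J and K tied to logical "1", flip-flop 232 behaves as a toggle flip-flop. Below is a hypothetical model of it, together with the complementary select lines it produces for multiplexers 224 and 226, assuming Q starts deasserted:

```python
class JKFlipFlop:
    """Edge-triggered J-K flip-flop model (hypothetical helper)."""

    def __init__(self):
        self.q = False  # assume USE BLOCK B starts deasserted

    def clock_edge(self, j=True, k=True):
        if j and k:
            self.q = not self.q   # J = K = 1: toggle
        elif j:
            self.q = True         # set
        elif k:
            self.q = False        # reset
        return self.q             # J = K = 0: hold


ff_232 = JKFlipFlop()
selects = []
for _ in range(3):  # three CONTEXT SWITCH edges
    use_block_b = ff_232.clock_edge()  # J and K tied to logical "1"
    # Multiplexer 224 takes Q directly; multiplexer 226 takes it through
    # inverter 234, so the two selects are always complementary.
    selects.append((use_block_b, not use_block_b))
print(selects)  # [(True, False), (False, True), (True, False)]
```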

In a similar fashion, the block valid data
(MARKER BLOCK VALID) from the PC unit (28 in
FIG. 1) is multiplexed between the data inputs of
the strams 220, 222 in response to the USE
BLOCK B signal. For this purpose, the data input
of the "B" stram 222 is connected to the MARKER
BLOCK VALID line through an AND gate 237, which
is enabled by the USE BLOCK B signal, and the
data input of the "A" stram 220 is connected to the
MARKER BLOCK VALID line through an AND gate
enabled by the complement of the USE BLOCK B
signal as provided by an inverter 238. Therefore,
when the USE BLOCK B signal is asserted, the
MARKER BLOCK VALID data is fed into the "B"
stram 222 while the "A" stram receives zero data
and is therefore cleared. Conversely, when the
USE BLOCK B signal is not asserted, the MARKER
BLOCK VALID data is fed into the "A" stram 220
while the "B" stram receives zero data and is
therefore cleared.

Finally, the valid bit outputs of the strams 220,
222 are connected to a pair of inputs to a
multiplexer 236. The select line of the multiplexer 236
is also connected to the Q output of the flip flop
232 to operate in conjunction with the multiplexers
224, 226. Accordingly, the stram 220, 222 which is
selected to receive the PC address is also selected
to deliver its output as the BLOCK VALID BIT.
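Putting the pieces of FIG. 9 together, one clock cycle of the valid bit logic can be sketched at the signal level. This is an interpretation, not the patent's circuit: the function and parameter names are hypothetical, the strams are Python lists of 256 booleans, the valid bit is read before the cycle's write, and a separate `write_active` flag (an assumption) controls when the active stram accepts MARKER BLOCK VALID.

```python
SIZE = 256  # one-bit locations per stram


def clock_cycle(bvsa, bvsb, use_block_b, pc_addr, reset_addr,
                marker_block_valid=False, write_active=False):
    """One hypothetical clock of the FIG. 9 logic; returns BLOCK VALID BIT."""
    # Multiplexers 224 (to BVSA) and 226 (to BVSB) have complemented
    # selects; assumed input order: input 0 = PC address, input 1 = reset.
    addr_a = reset_addr if use_block_b else pc_addr
    addr_b = pc_addr if use_block_b else reset_addr

    # Output multiplexer 236 shares the select, so it reads the stram that
    # is receiving the PC address (modeled as read-before-write).
    block_valid_bit = bvsb[addr_b] if use_block_b else bvsa[addr_a]

    # AND gates: only the active stram can see MARKER BLOCK VALID; the
    # other one always sees 0 and is cleared at the reset address.
    data_a = marker_block_valid and not use_block_b
    data_b = marker_block_valid and use_block_b
    if use_block_b:
        bvsa[addr_a] = data_a            # always False: clearing BVSA
        if write_active:
            bvsb[addr_b] = data_b
    else:
        bvsb[addr_b] = data_b            # always False: clearing BVSB
        if write_active:
            bvsa[addr_a] = data_a
    return block_valid_bit


bvsa = [False] * SIZE
bvsb = [True] * SIZE     # stale bits left over from the previous context
print(clock_cycle(bvsa, bvsb, False, 7, 0, True, True))  # False: miss, then mark
print(clock_cycle(bvsa, bvsb, False, 7, 1))              # True: hit
for reset_addr in range(2, SIZE):
    clock_cycle(bvsa, bvsb, False, 7, reset_addr)
print(any(bvsb))         # False: BVSB fully cleared after 256 cycles
```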



Claims 

1. An instruction buffer system for a digital 
computer for controlling the delivery of instruction 
stream bytes to an instruction decoder (32) capable 
of simultaneously decoding a variable number of 
instruction bytes; the system being characterised 
by: 

(1) an instruction buffer (74) having multiple
byte locations for receiving the instruction bytes to
be decoded;

(2) first (64) and second (66) prefetch buffers
for storing a preselected number of subsequent
bytes of the instruction stream;

(3) means (72) for refilling the instruction
buffer (74) with a selected number of sequential 
bytes of the instruction stream retrieved from at 
least one of the first (64) and second (66) prefetch 
buffers; 

(4) means for refilling the first prefetch buffer
(64) with sequential bytes of the instruction stream
when the first prefetch buffer is emptied; and

(5) means for refilling the second prefetch 
buffer (66) with sequential bytes of the instruction 
stream when the second prefetch buffer is emptied.

2. An instruction buffer system, as claimed in 
Claim 1, including means for delivering a signal 
responsive to the number of bytes of instruction
stream contained in an instruction currently being 
decoded, and further including a shifter (70) for 
receiving the said signal and shifting the contents 
of the instruction buffer (74) by a number of bytes 
representative of the said number. 

3. An instruction buffer system, as claimed in 
Claim 1 or Claim 2 wherein the instruction buffer 
refilling means (72) includes means for retrieving 
sequential bytes of the instruction stream from one 
of the first (64) and second (66) prefetch buffers to 
fill the buffer locations from which instruction 
stream bytes have been removed. 

4. An instruction buffer system, as claimed in
Claim 2 or Claim 3, wherein the instruction buffer
refilling means (72) includes means (68) for
receiving the sequential bytes of instruction stream
retrieved from the first (64) and second (66) prefetch
buffers and rotating the bytes by a preselected
number of byte locations responsive to the number
of bytes indicated by the said signal.

5. An instruction buffer system, as claimed in
any one of Claims 2 to 4, wherein the means for
refilling the first (64) and second (66) prefetch
buffers is operative in response to the absence of
the said signal.

6. An instruction buffer system, as claimed in
any one of the preceding claims, including means
for receiving the instruction stream from a virtual
instruction cache (28) in response to both of the
first (64) and second (66) prefetch buffers having
been emptied of instruction stream bytes.

7. An instruction buffer system for a digital
computer for controlling the delivery of an
instruction stream to an instruction decoder (32) capable
of simultaneously decoding a variable number of
instruction bytes, the decoder (32) having means
for delivering a signal responsive to the number of
bytes of instruction stream being decoded; the
system being characterised by:

(1) an instruction buffer (74) for maintaining a
preselected number of the next required sequential
bytes of instruction stream, and means for
delivering the said preselected number of instruction
stream bytes to the decoder;

(2) first means for prefetching and maintaining
in a first prefetch buffer (64) a preselected
number of sequential bytes of the instruction
stream;
(3) second means for prefetching and maintaining
in a second prefetch buffer (66) a preselected
number of the next sequential bytes of the
instruction stream subsequent to the bytes of
instruction stream maintained in the first prefetch
buffer (64);

(4) a shifter for receiving the said signal and
shifting the contents of the instruction buffer (74)
by a preselected number of bytes responsive to
the said signal, and delivering the shifted bytes to
the instruction buffer;

(5) means for retrieving sequential bytes of
the instruction stream from one of the first (64) and
second (66) prefetch buffers to fill the instruction
buffer locations from which bytes of the instruction
stream have been removed;

(6) means for refilling the first (64) prefetch
buffer with subsequent instruction stream bytes in
response to the first prefetch buffer (64) being
emptied; and

(7) means for refilling the second (66)
prefetch buffer with subsequent instruction stream
bytes in response to the second prefetch buffer
(66) being emptied.

8. An instruction buffer system, as claimed in
Claim 7, wherein the instruction buffer refilling
means includes means for receiving the sequential
bytes of instruction stream retrieved from the first
(64) and second (66) prefetch buffers and rotating
the bytes by a preselected number of byte
locations in response to the said signal.

9. An instruction buffer system, as claimed in
Claim 7 or Claim 8, wherein the means for refilling
the first (64) and second (66) prefetch buffers are
operative in response to the absence of the said
signal.

10. An instruction buffer system, as claimed in
any one of Claims 7 to 9, wherein the means for
retrieving to fill the instruction buffer (74) includes
means for retrieving bytes of the instruction stream
from a cache memory (28) in response to the first
(64) and second (66) prefetch buffers having been
emptied.

11. A virtual instruction cache (28)
for a digital computer arranged to store a selected
portion of an instruction stream therein and being
adapted to replace the said selected portion with
another portion of the instruction stream in
response to a context switch, the virtual instruction
cache being organised into blocks of a preselected
number of bytes of the instruction stream, each of
the blocks having associated therewith a valid bit
which is set to indicate that at least a portion of the
instruction stream bytes stored in that block are
valid, characterised in that the valid bits are
organised and stored in a valid bit RAM which comprises:
(1) first and second valid bit stores (220,
222), each having a preselected number of storage
locations;

(2) means (228, 224) for delivering a ...

(3) means (236) for retrieving the valid bit ...
stored at the preselected address;

(4) means for resetting all of the valid bits
stored in the other valid bit store;

(5) means for alternately selecting the first
valid bit store and the second valid bit store to be
the said one valid-bit store in response to a context
switch.

12. A virtual instruction cache, as claimed in
Claim 11, wherein the resetting means (228)
includes means for delivering a reset signal in
response to all of the storage locations being reset
and means for denying the context switch in
response to the absence of the reset signal.

13. A virtual instruction cache, as claimed in
Claim 11 or Claim 12, wherein the means for
alternately selecting includes first (224) and second
(226) multiplexors each having first and second
inputs respectively connected to the preselected
address delivering means and the resetting means
(228), an output connected to the first and second
valid bit stores, the first multiplexer (224) having a
select input connected to means (232) for
alternately cycling between an asserted and unasserted
state in response to a context switch, and the
second multiplexer (226) having a select input
connected through an inverter (234) to the alternately
cycling means (232).

14. A virtual instruction cache, as claimed in
Claim 12 or Claim 13, wherein the means for
alternately selecting includes an output multiplexer
having first and second inputs respectively
connected to the outputs of the first and second valid
bit stores (220; 222), an output, and a select input
connected to the means (232) for alternately
cycling between an asserted and unasserted state in
response to a context switch.